Relevant Data
Data quality is a perception or an assessment of data's fitness to serve its purpose in a given context.

Relevance
To produce an effective solution, it is crucial that the data you put into a computer for processing is relevant and free of error. Data that has no bearing on the decisions to be made, or that does not provide useful information to be shared, can be considered irrelevant. Information produced from incorrect or irrelevant data is an unreliable basis for a decision. Therefore, it is important to ensure that the data used to develop a solution is of the highest quality before it is input.

Suitability
The data to be entered into a computer must be in a suitable format. When data is being collected or prepared for input, it must be checked to ensure that it is of the correct data type. For example, a casual employee’s wage may need to be calculated and recorded using a spreadsheet. This requires entering the hours worked and multiplying them by the hourly pay rate. The hours worked must be expressed as a number data type, such as the number 40, because a mathematical calculation needs to take place. An error will occur if text, such as ‘forty’, or alphanumeric data, such as ‘40 hours’, is entered. Note too that if the calculation instead added up a range of cells with a function such as SUM, many spreadsheet programs would ignore the incorrectly entered data. The spreadsheet would complete the sum using the rest of the data, which would result in an incorrect wage without triggering an error message.

Reliability
The development of information technology and digital communications has made it easier for people around the globe to communicate their views and present information in a format that is easily accessible to others. In particular, the World Wide Web (or the Web) is now awash with personal websites, homemade videos, wikis, podcasts, vodcasts and a plethora of unchecked information.
These forms of output are often highly valuable sources of data, but they also provide an easy means for individuals to showcase viewpoints and modes of behaviour that are not widely accepted or proven to be accurate. The collector of data must ensure that the sources, and therefore the data, are reliable. Unreliable sources of data will result in incorrect or unproven information that could cause inappropriate decisions to be made.

Accuracy
Data that is entered into a computer must be accurate. Transcription is often a cause of error. Transcription errors occur when the person entering the data misreads or mistypes the information through, for example, a lapse in concentration, an interruption or pressing the wrong key. It is easy to make a mistake when entering a large amount of data, particularly numbers with many digits that may not contain spaces or punctuation to signify thousands. Clearly, if the data entered is incorrect, the information produced will be incorrect. If data has been gathered from a primary source, it is a good idea to check it against that source. If data has been gathered from a secondary source and is suspect, it is worthwhile verifying the data using other secondary sources.

Timeliness
Data must be current (that is, timely) to produce usable information. Data needs to be processed while it is current because decision-making should not be based on outdated data. For example, a decision to take out a mortgage should not be based on interest rates that are more than a week old. Data can also be collected too early. If the government wishes to check whether or not a new policy has public support, it could survey a sample of people. If the poll is conducted on the day of the announcement, there is a strong possibility that many of the people surveyed would not have heard about the new policy and thus would not have read it thoroughly or thought through its implications.
The survey data would then be flawed because it was gathered too early, and hence the findings may prove to be inaccurate.

Freedom from bias
Bias can easily creep into data and make the information processed from it unreliable. Several influences can introduce bias into data: namely, vested interest, timing, small sample size, sorting and graphic representations.

Vested interest
Bias can enter data if the respondent to a survey or interview has a vested interest in the outcome of the research. A common example is celebrities who are paid to promote particular products in commercials. It would be unreasonable to trust their statements that one product is better than others purely because they are celebrities; they are only saying what they have been paid to say and may not be providing an independent judgement derived from research or experience.

Timing
The timing of the data collection may also introduce bias. For example, suppose you plan to survey a sample of the population for their views about Australia becoming a republic. The data you gather may be biased if, just prior to the survey being conducted, a royal tour takes place and there is extensive media coverage about the royal family. The timing of the data collection would introduce bias because it coincides with a significant event that distorts the responses away from those that would be provided at a different time. Note too that bias is not restricted to data gathered from surveys or during interviews. For example, suppose that Qantas needed to decide whether to schedule two new weekly flights from Paris. The decision could depend on the demand for existing flights. If the airline collected data from bookings made over a four-week period just before or during the soccer World Cup to be held in Melbourne, the data gathered would be biased.
Such data should not be relied on for making this decision because the influence of this event on customer demand is irregular and unlikely to occur again.

Small sample size
Choosing a sample size that is too small may also introduce bias. The sample size must relate to the purpose of the data collection and, generally, a larger sample size leads to greater precision. It must be big enough that any conclusions drawn and information produced are credible. For example, if you wanted to determine whether or not the school uniform should be changed, it would be remiss to survey only the students in your class. Not only would this sample not be representative of the student body, but it would also not include other stakeholders, such as parents and school administrators. Similarly, gathering sales data over a four-day period to predict monthly sales at a fish and chip shop would be insufficient for decision-making. For instance, by choosing the four Mondays in the month, you may be selecting the quietest trading days in the week. If you pick the four Fridays in the month, you may be picking the busiest trading days. When selecting a sample, you need to ensure that it is large enough and representative of the whole population.

Bias through sorting
The way in which you sort lists can introduce bias, although frequently this is unavoidable. A classroom teacher often consults a class list that is sorted alphabetically, for example to select students for special tasks. The list is biased towards students whose surnames appear early in the alphabet and thus at the top of the class list. If you need to hire an electrician and consult a paper-based telephone book or an online directory, it is more likely that you will pick an early entry than one from the second page of listings. Bias of this type is difficult to avoid, so it is preferable to educate the user to recognise that the output has built-in bias and to encourage strategies to overcome that bias.
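One possible strategy for overcoming the sorting bias described above is to select names at random rather than from the top of the list. The sketch below is an illustration only: the function name pick_students and the sample class list are hypothetical, and it assumes Python's standard random module.

```python
import random


def pick_students(class_list, how_many, seed=None):
    """Return `how_many` different names chosen at random from `class_list`.

    Every name has the same chance of selection, wherever it sits in
    the alphabetical order. `seed` is optional and only makes a run
    repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(class_list, how_many)


# Hypothetical class list, sorted alphabetically by surname.
class_list = ["Adams", "Baker", "Nguyen", "Smith", "Young", "Zhang"]
chosen = pick_students(class_list, 2)
print(chosen)
```

Because the selection no longer depends on position, a student at the bottom of the alphabetical list is as likely to be chosen as one at the top.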
Bias through graphic representations
Bias can occur through your choice of graphic type, scale used and size chosen. Graphic representations should be sized proportionally to avoid overstating or trivialising the importance of one of the variables involved.
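As a worked sketch of proportional sizing, assume a pictogram in which each value is drawn as a square icon. If both the width and the height of an icon are multiplied by the ratio of the values, its area grows by the square of that ratio and overstates the larger value. The function name proportional_side and the figures below are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def proportional_side(base_side, base_value, value):
    """Side length of a square icon whose AREA is proportional to `value`.

    `base_side` is the side of the icon drawn for `base_value`.
    """
    return base_side * math.sqrt(value / base_value)


# Hypothetical figures: the second value is four times the first.
small_side = 10.0
large_side = proportional_side(small_side, base_value=100, value=400)

# Area ratio when the icon's area is kept proportional to the value.
fair_ratio = (large_side ** 2) / (small_side ** 2)

# Area ratio when width and height are both multiplied by 4.
naive_ratio = ((small_side * 4) ** 2) / (small_side ** 2)

print(large_side, fair_ratio, naive_ratio)
```

In this sketch the proportionally sized icon covers four times the area of the smaller one, matching the fourfold difference in the values, whereas scaling both dimensions by four produces an icon sixteen times the area.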